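Vector graphic documents present multiple visual elements, such as images, shapes, and text. Choosing appropriate colors for multiple visual elements is a difficult but crucial task for both amateurs and professional designers. Instead of creating a single color palette for all elements, we extract multiple color palettes from each visual element in a graphic document and then combine them into a color sequence. We propose a masked color model for color sequence completion and recommend the specified colors with high probability based on the multi-palette color context. We train the model and build a color recommendation system on a large-scale dataset of vector graphic documents. In quantitative and qualitative evaluations of color prediction, the proposed color recommendation method outperforms other state-of-the-art methods, and our color recommendation system received positive feedback from professional designers in an interview study.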
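The multi-palette idea can be illustrated with a minimal sketch, assuming each visual element contributes a short list of RGB colors. It flattens the per-element palettes into one color sequence and hides one color for a masked color model to predict. The `SEP` and `MASK` tokens and both function names are hypothetical; the abstract does not specify the paper's actual sequence encoding.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
SEP = "SEP"    # hypothetical token separating palettes of different elements
MASK = "MASK"  # hypothetical token marking the color the model must predict


def build_color_sequence(palettes: List[List[RGB]]) -> List[object]:
    """Concatenate per-element palettes into one sequence, separated by SEP."""
    seq: List[object] = []
    for i, palette in enumerate(palettes):
        if i > 0:
            seq.append(SEP)
        seq.extend(palette)
    return seq


def mask_color(seq: List[object], index: int) -> Tuple[List[object], object]:
    """Replace one color with MASK; return the masked sequence and the hidden target."""
    if seq[index] == SEP:
        raise ValueError("cannot mask a separator")
    masked = list(seq)
    target = masked[index]
    masked[index] = MASK
    return masked, target


# Example: three elements (e.g. an image, a shape, and a text block).
palettes = [[(255, 255, 255), (30, 30, 30)], [(200, 40, 40)], [(30, 30, 30), (240, 200, 0)]]
seq = build_color_sequence(palettes)
masked, target = mask_color(seq, 3)  # hide the shape's color
```

A trained masked color model would then score candidate colors for the `MASK` position using the colors of the other palettes as context.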
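We propose a novel scene representation that encodes reaching distance, i.e., the distance from any position in the scene to a goal along a feasible trajectory. We demonstrate that this environment field representation can directly guide the dynamic behaviors of agents in 2D mazes or 3D indoor scenes. Our environment field is a continuous representation learned by a neural implicit function from discretely sampled training data. We showcase its application to agent navigation in 2D mazes and to human trajectory prediction in 3D indoor environments. To produce physically plausible and natural trajectories for humans, we additionally learn a generative model that predicts the regions where humans commonly appear, and we constrain the environment field to be defined within these regions. Extensive experiments demonstrate that the proposed method can generate feasible and plausible trajectories efficiently and accurately.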
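A discrete analogue makes the reaching-distance idea concrete. The following minimal sketch assumes a 4-connected grid maze (`.` free, `#` wall) and computes, by breadth-first search from the goal, the number of steps along a feasible path from every reachable cell. Samples like these are the kind of discrete training data a neural implicit function could be fit to; the implicit function itself is not shown. An agent then navigates by repeatedly stepping to the neighbor with the smallest distance. All names here are illustrative.

```python
from collections import deque
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connected moves


def reaching_distance(maze: List[str], goal: Cell) -> Dict[Cell, int]:
    """BFS from the goal over free cells; returns steps-to-goal for each reachable cell."""
    rows, cols = len(maze), len(maze[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in STEPS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and maze[nr][nc] == "." and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def navigate(dist: Dict[Cell, int], start: Cell) -> List[Cell]:
    """Greedy descent on the distance field; start must be reachable (a key of dist)."""
    path = [start]
    cur = start
    while dist[cur] > 0:
        r, c = cur
        neighbors = [(r + dr, c + dc) for dr, dc in STEPS]
        cur = min((n for n in neighbors if n in dist), key=dist.__getitem__)
        path.append(cur)
    return path


maze = [
    "....",
    ".##.",
    "....",
]
dist = reaching_distance(maze, goal=(0, 0))
path = navigate(dist, start=(2, 3))
```

Because BFS distances decrease by exactly one along some neighbor of every reachable cell, the greedy descent always terminates at the goal; walls never receive a distance.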
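Graph neural networks (GNNs) are deep learning models that take graph data as input, and they are applied to various tasks such as traffic prediction and molecular property prediction. However, owing to the complexity of GNNs, it is difficult to analyze which parts of the input affect a GNN model's output. In this study, we extend explainability methods for convolutional neural networks (CNNs), such as Local Interpretable Model-agnostic Explanations (LIME), gradient-based saliency maps, and Gradient-weighted Class Activation Mapping (Grad-CAM), to GNNs, and predict which edges in the input graph are important for GNN decisions. Experimental results indicate that the LIME-based approach is the most effective explainability method for multiple tasks in real-world situations, outperforming even the state-of-the-art method in GNN explainability.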
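As a rough illustration of how LIME can be applied to edges, here is a minimal sketch under stated assumptions: the model is treated as a black box `predict(edge_mask)` that re-runs the GNN with some edges removed, perturbations are random binary edge masks, proximity is an exponential kernel on the fraction of edges removed, and a weighted linear surrogate is fit by least squares. The sampling scheme, kernel, and `toy_gnn` stand-in are assumptions, not the study's exact adaptation.

```python
from typing import Callable

import numpy as np


def lime_edge_importance(
    predict: Callable[[np.ndarray], float],
    num_edges: int,
    num_samples: int = 500,
    kernel_width: float = 0.25,
    seed: int = 0,
) -> np.ndarray:
    """Fit a weighted linear surrogate over edge masks; return one score per edge."""
    rng = np.random.default_rng(seed)
    # Each row is a binary mask: 1 keeps the edge, 0 removes it.
    masks = rng.integers(0, 2, size=(num_samples, num_edges)).astype(float)
    masks[0] = 1.0  # always include the unperturbed graph
    outputs = np.array([predict(m) for m in masks])
    # Proximity weight: perturbations that remove fewer edges count more.
    removed_frac = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(removed_frac ** 2) / kernel_width ** 2)
    # Weighted least squares with an intercept column.
    X = np.hstack([np.ones((num_samples, 1)), masks])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], outputs * sw, rcond=None)
    return coef[1:]  # drop the intercept


def toy_gnn(edge_mask: np.ndarray) -> float:
    """Hypothetical stand-in for a GNN whose output depends on edges 0 and 2."""
    return float(np.dot([3.0, 0.0, 1.0], edge_mask)) + 0.5


importance = lime_edge_importance(toy_gnn, num_edges=3)
```

With a real GNN, `predict` would rebuild the graph without the masked-out edges and return, for example, the predicted class probability; the largest surrogate coefficients mark the most influential edges.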
We present a physics-based inverse rendering method that learns the illumination, geometry, and materials of a scene from posed multi-view RGB images. To model scene illumination, existing inverse rendering methods either ignore indirect illumination entirely or model it with coarse approximations, leading to suboptimal predictions of the scene's illumination, geometry, and materials. In this work, we propose a physics-based illumination model that explicitly traces the incoming indirect light at each surface point based on interreflection, and then estimates each identified indirect light with an efficient neural network. Furthermore, we use Leibniz's integral rule to resolve the non-differentiability in the proposed illumination model caused by one type of environment light, the tangent lights. As a result, the proposed interreflection-aware illumination model can be learned end-to-end together with geometry and material estimation. As a by-product, our physics-based inverse rendering model also supports flexible and realistic material editing and relighting. Extensive experiments on both synthetic and real-world datasets demonstrate that the proposed method performs favorably against existing inverse rendering methods on novel view synthesis and inverse rendering.
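For reference, the general form of Leibniz's integral rule handles integrals whose limits depend on the parameter being differentiated. How the method maps the tangent-light boundary onto these limits is not spelled out in the abstract, so the correspondence below is only indicative:

```latex
\frac{d}{d\theta}\int_{a(\theta)}^{b(\theta)} f(x,\theta)\,dx
  = f\big(b(\theta),\theta\big)\,b'(\theta)
  - f\big(a(\theta),\theta\big)\,a'(\theta)
  + \int_{a(\theta)}^{b(\theta)} \frac{\partial f}{\partial \theta}(x,\theta)\,dx
```

The two boundary terms account for the change contributed by the moving integration limits. Differentiating only the integrand, as in the last term, would miss them, which is why a boundary that moves with the geometry (such as the set of lights tangent to a surface) makes a naive gradient incorrect.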
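Mobile-centric AI applications place high demands on the resource efficiency of model inference. Input filtering is a promising approach to eliminating redundancy and thus reducing inference cost. Previous efforts have tailored effective solutions for many applications, but two essential questions remain unanswered: (1) the theoretical filterability of an inference workload, which would guide the application of input filtering techniques and thereby avoid trial-and-error costs for resource-constrained mobile applications; and (2) the discriminability of feature embeddings, which allows input filtering to be effective across diverse inference tasks and input content. To answer them, we first formalize the input filtering problem and theoretically compare the hypothesis complexity of inference models and input filters to understand the optimization potential. We then propose the first end-to-end learnable input filtering framework, which covers most state-of-the-art methods and learns feature embeddings with robust discriminability. We design and implement InFi, which supports six input modalities and multiple mobile-centric deployments. Comprehensive evaluations confirm our theoretical results and show that InFi outperforms strong baselines in applicability, accuracy, and efficiency. For a video analytics application on mobile platforms, InFi achieves 8.5x throughput and saves 95% of bandwidth while keeping accuracy above 90%.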
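The basic input-filtering pattern can be shown with a minimal sketch, assuming a cheap scoring function stands in for the learned filter. Inputs scoring below a threshold are judged redundant and skip the expensive model, and the skip rate indicates how much inference cost is saved. This shows only the runtime gating, not InFi's end-to-end training; `filtered_inference`, `FilterStats`, and the threshold are hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple


@dataclass
class FilterStats:
    """Counts how many inputs bypassed the expensive model."""
    total: int = 0
    skipped: int = 0

    @property
    def skip_rate(self) -> float:
        return self.skipped / self.total if self.total else 0.0


def filtered_inference(
    inputs: Iterable[Any],
    filter_score: Callable[[Any], float],
    model: Callable[[Any], Any],
    threshold: float = 0.5,
    default: Any = None,
) -> Tuple[List[Any], FilterStats]:
    stats = FilterStats()
    results: List[Any] = []
    for x in inputs:
        stats.total += 1
        if filter_score(x) < threshold:
            # Judged redundant: skip the costly model and emit a default output.
            stats.skipped += 1
            results.append(default)
        else:
            results.append(model(x))
    return results, stats


# Toy example: inputs are "motion magnitudes"; only large ones reach the model.
results, stats = filtered_inference([0, 1, 9, 2, 8], lambda x: x / 10, lambda x: 2 * x)
```

In a real deployment, the filter would be a small learned network running on the device, and a skipped input could also save the bandwidth of uploading it to a server.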
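Unlike 2D object detection, where all RoI features come from grid pixels, RoI feature extraction for 3D point cloud object detection is more diverse. In this paper, we first compare and analyze the differences in structure and performance between two state-of-the-art models, PV-RCNN and Voxel-RCNN. We find that the performance gap between the two models comes not from point information but from structural information. Voxel features contain more structural information because they are obtained by quantizing the point cloud rather than downsampling it, so they essentially retain the complete information of the whole point cloud. In our experiments, the strong structural information in voxel features gives the detector higher performance, even though voxel features lack accurate location information. We therefore argue that structural information is the key to 3D object detection. Based on these conclusions, we propose a Self-Attention RoI Feature Extractor (SARFE) to enhance the structural information of features extracted from 3D proposals. SARFE is a plug-and-play module that can easily be used with existing 3D detectors. We evaluate SARFE on the KITTI dataset and the Waymo Open Dataset. With the newly introduced SARFE, we improve the performance of state-of-the-art 3D detectors by a large margin on the Cyclist class of the KITTI dataset while maintaining real-time capability.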
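The core operation can be illustrated with a minimal sketch of a generic self-attention RoI feature extractor. It assumes each proposal has a set of per-point or per-voxel feature vectors, applies single-head scaled dot-product self-attention with a residual connection, and max-pools to one RoI feature. The weight matrices and function names are hypothetical, and SARFE's actual architecture may differ.

```python
import numpy as np


def self_attention(x: np.ndarray, wq: np.ndarray, wk: np.ndarray, wv: np.ndarray):
    """Single-head scaled dot-product self-attention over the N features in x (N, D)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)  # each row sums to 1
    return attn @ v, attn


def roi_feature(features: np.ndarray, wq, wk, wv) -> np.ndarray:
    """Enhance features with self-attention context, then max-pool to one RoI vector."""
    context, _ = self_attention(features, wq, wk, wv)
    enhanced = features + context  # residual connection (requires wv to be D x D)
    return enhanced.max(axis=0)


rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))  # 16 points/voxels in one proposal, 8 channels each
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(feats, wq, wk, wv)
```

Self-attention lets every feature inside a proposal aggregate information from all the others, which is one way to encode the relative structure among them. The final max pool keeps the RoI feature independent of the order in which points are stored.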
In this paper, we propose a robust 3D detector named Cross Modal Transformer (CMT) for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is simple, yet its performance is strong: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains highly robust even when the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Knowledge graphs (KGs) have served as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG in which entities and relations are composed of free-form text. However, previous work on KG completion and CKG completion suffers from long-tail relations and newly added relations, which do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which draws on the strengths of both graph representation learning and few-shot learning, has been proposed to tackle the problem of limited annotated data. In this paper, we comprehensively survey previous work on these tasks, covering both methods and applications. Specifically, we first introduce FKGC challenges and commonly used KGs and CKGs. We then systematically categorize and summarize existing work by the type of KG and the method. Finally, we present applications of FKGC models to prediction tasks in different areas and share our thoughts on future research directions for FKGC.
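The few-shot setting can be made concrete with a minimal sketch. It assumes pretrained entity embeddings and uses a simple TransE-style relation prototype: for a relation with only K support triples (h, t), the relation vector is estimated as the mean of t − h, and candidate tails for a query head are ranked by −‖h + r − t‖. This is an illustrative baseline chosen for clarity, not a specific method from the survey; the embeddings and entity names are made up.

```python
from typing import Dict, List, Tuple

import numpy as np


def relation_prototype(support: List[Tuple[str, str]], emb: Dict[str, np.ndarray]) -> np.ndarray:
    """Estimate a relation vector from K support (head, tail) pairs as mean(t - h)."""
    return np.mean([emb[t] - emb[h] for h, t in support], axis=0)


def rank_tails(head: str, candidates: List[str], proto: np.ndarray,
               emb: Dict[str, np.ndarray]) -> List[str]:
    """Rank candidate tails by TransE-style score -||h + r - t|| (higher is better)."""
    scores = {c: -float(np.linalg.norm(emb[head] + proto - emb[c])) for c in candidates}
    return sorted(candidates, key=lambda c: scores[c], reverse=True)


# Toy 2D embeddings (hypothetical); the relation is "capital_of" with K = 2 support triples.
emb = {name: np.array(vec, dtype=float) for name, vec in {
    "paris": [0, 0], "france": [1, 0],
    "berlin": [0, 1], "germany": [1, 1],
    "rome": [0, 2], "italy": [1, 2],
}.items()}
proto = relation_prototype([("paris", "france"), ("berlin", "germany")], emb)
ranking = rank_tails("rome", ["germany", "italy", "france"], proto, emb)
```

Learned FKGC methods typically replace the plain mean with neighbor encoders or meta-learned matching, but the interface stays the same: a handful of support triples in, a ranking of candidate entities out.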
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS and its incremental variants, and introduce a new framework named Reference Twice (RefT) that fully exploits the relationship between support and query features within a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate more appropriate dynamic class centers to re-weight query features. Second, we find that support object queries already encode key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features, and then propose to link object queries via cross-attention for better calibration. With these steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our framework can be easily extended to incremental FSIS with minor modifications. When benchmarking on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared with existing approaches across different shots; e.g., we boost nAP by a notable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and models will be made available.
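One common way to turn support masks into class centers is masked average pooling, sketched below under stated assumptions. The support feature map is averaged over the pixels inside the support mask, and each query location is then re-weighted by its cosine similarity to that center, clipped to [0, 1]. This is a generic sketch of the idea, not RefT's actual mask-based dynamic weighting module, and the function names are hypothetical.

```python
import numpy as np


def masked_average_pooling(feat: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Class center from a support feature map feat (C, H, W) and binary mask (H, W)."""
    denom = mask.sum()
    if denom == 0:
        raise ValueError("empty support mask")
    return (feat * mask[None]).sum(axis=(1, 2)) / denom


def reweight_query(query: np.ndarray, center: np.ndarray, eps: float = 1e-8):
    """Scale each query location (C, H, W) by its clipped cosine similarity to the center."""
    q = query.reshape(query.shape[0], -1)  # (C, H*W), one column per location
    sim = (center @ q) / (np.linalg.norm(center) * np.linalg.norm(q, axis=0) + eps)
    weight = np.clip(sim, 0.0, 1.0).reshape(query.shape[1:])
    return query * weight[None], weight


# Toy 2-channel, 2x2 feature map: the top row is "object", the bottom row is background.
support = np.array([[[1.0, 1.0], [0.0, 0.0]],
                    [[0.0, 0.0], [1.0, 1.0]]])
support_mask = np.array([[1.0, 1.0], [0.0, 0.0]])
center = masked_average_pooling(support, support_mask)
reweighted, weight = reweight_query(support, center)
```

Using the mask keeps background pixels out of the class center, so query locations that resemble the object are emphasized and background-like locations are suppressed.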
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. It is therefore difficult to deploy them on edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution for compressing GNNs, in which a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods do not consider fairness. As a consequence, the student model usually inherits, and can even exaggerate, the bias of the teacher GNN. To handle this problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. We then propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, so it can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
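The two ingredients of the problem can be sketched minimally: the standard soft-label distillation loss, where the student matches the teacher's temperature-softened class distribution via KL divergence scaled by T², and statistical parity difference, a common measure of the bias a student may inherit. This illustrates only the baseline KD objective and one bias metric; it is not RELIANT's debiasing mechanism, and the choice of metric here is an assumption.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2 (one node)."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * temperature ** 2


def statistical_parity_difference(preds: Sequence[int], groups: Sequence[int]) -> float:
    """|P(pred=1 | group 0) - P(pred=1 | group 1)|; both groups must be non-empty."""
    rates = {}
    for g in (0, 1):
        members = [p for p, grp in zip(preds, groups) if grp == g]
        rates[g] = sum(members) / len(members)
    return abs(rates[0] - rates[1])
```

In a GNN setting, `kd_loss` would be averaged over nodes, and the parity gap would be computed from the student's node predictions grouped by a sensitive attribute. A fair distillation method aims to shrink that gap without a large drop in accuracy.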